DOCUMENT RESUME



ED 040 020 



RE 002 791 



AUTHOR        Farr, Roger
TITLE         The Fallacies of Testing.
PUB DATE      Mar 70
NOTE          10p.; Paper presented to the Conference on Reading
              and the National Interest, Bloomington, Ind., Mar.
              22-24, 1970
EDRS PRICE    EDRS Price MF-$0.25 HC-$0.60
DESCRIPTORS   Evaluation, Reading Level, *Reading Tests,
              *Standardized Tests, Test Construction, *Testing
              Problems, *Test Interpretation, Test Results



ABSTRACT 

Three major points covered by this report are (1)
what are the demands for reading assessment and how have the demands
increased? (2) how adequately do present standardized reading tests
meet these demands? and (3) what possible approaches exist for
developing assessment procedures which meet these demands? An extreme
interest in finding out how well students are reading exists within
the educational profession and the general public, the author states.
Standardized tests are being used extensively to determine
students' reading levels, but almost all of the tests examined
are neither able nor designed to meet the demands of the
decision situations in which they are often being used. In addition,
there are many situations in which the results of these standardized
reading tests are being misused and misinterpreted. The author
concludes with four basic approaches which he believes may be
considered in developing assessment procedures that meet the demands for
accurate measurement of reading achievement. (NH)






THE FALLACIES OF TESTING 

PRESENTED BY

DR. ROGER FARR

DIRECTOR, READING CLINIC
AND
CO-EDITOR, READING RESEARCH QUARTERLY

Indiana University
Bloomington, Indiana

TO THE CONFERENCE ON READING AND THE NATIONAL INTEREST

March 22-24, 1970



Before I begin my comments on reading tests, I want to assure you
that I am neither attempting to foster a "Ban the Test movement," nor
claiming that the assignment of a number to an event is the end goal of
any educational effort. Rather, I believe that we should consider testing
as only one means of making better instructional decisions. This position
often raises the question: what can be measured? Aren't there some
things which are intrinsically unmeasurable? Abraham Kaplan, in his book
The Conduct of Inquiry, answers this question more cogently than I can: "For
my part, I answer these questions with an unequivocal 'No.' I would say
that whether we can measure something depends, not on the thing, but on
how we have conceptualized it, on our knowledge of it, and above all on
the skill and ingenuity which we can bring to bear on the process of
measurement which our inquiry can put to use."¹

I would like to address my remarks today to three major points: 

1. What are the demands for reading assessment and how have 
the demands increased? 

2. How adequately do present standardized reading tests meet these
demands?

3. What possible approaches exist for developing assessment
procedures which meet these demands?

Demands and reasons for more valid reading assessment 

The plea for more valid measurement of reading behaviors is not new. 
However, the emphasis in Congress on accountability, the attempts by several 
publishing corporations to sell instructional products on a sliding 
cost scale based upon reading gains of children, the national Right to
Read program, and the targeted research plan of the U.S. Office of Education
have all contributed to a growing interest in the assessment of reading
behaviors.

¹Abraham Kaplan, The Conduct of Inquiry (San Francisco: Chandler
Publishing Company, 1964), p. 176.

In fact, the targeted research program will be developed on the basis
of criterion tests of reading. The following quote from a recent announcement
of the targeted research program makes the point quite specifically:

"The U.S. Office of Education intends to support a five-phase program
of research and development on reading to reach the following objectives:
100 percent of all persons not in permanent care institutions must pass,
by age 10, a criterion-referenced test which is predictive of competent
performance on a set of adult reading tasks selected to have a favorable
return to the individual and to society in general."²

Dr. James Allen has also alluded to the development of criterion 
tests in his speeches and comments on the Right to Read program. In a 
recent issue of Family Weekly,³ Dr. Allen stated that one in four students
nationwide has significant reading deficiencies and that up to half of
the students in large city school systems read below expectations for their
age levels. He also stated that among unemployed young people between ages 
16 and 21, about half are functionally illiterate. 

A consideration of statements and efforts like those above has led 
me to the following conclusions: 

1. There exists among both the education profession and the general 
public an extreme desire and interest in finding out how well 
students are reading — in a very functional way. 

2. Standardized reading tests (usually standardized tests developed
by large publishing companies) are being used extensively to
determine how well students are reading.

3. There are many, many situations in which the results of
standardized reading tests are being misused and misinterpreted.

²"Research and Development Sources Sought," Commerce Business Daily,
February 25, 1970.

³James C. C. Conniff, "Goal for the '70's: To Improve Your Child's Reading:
An Exclusive Interview with Dr. James E. Allen, Jr.," Family Weekly,
March 15, 1970.

Before turning to consideration of how well present standardized 
reading tests meet the needs that I have delineated, I would like to
define the difference between a criterion test and a standardized or 
norm referenced test. I want to make this distinction clear because 
the testing needs that exist in the field of reading necessitate the use 
of criterion referenced tests, but norm referenced tests are now being 
used to fulfill these needs. 

Criterion referenced tests are very closely related to the old concept
of a mastery test. The purpose of such a test is to measure achievement
of a very specific behavior and often to make a very specific decision. 

For example, has Bill mastered the skills necessary to drive a car? Is 
Sam able to swim a mile? Or has Jerry mastered the essential beginning 
reading skills necessary to go on to the next phase of instruction? In 
each of these situations, the criterion is quite definite and the student 
is assessed to determine whether he can complete the task. 

A standardized norm referenced test is also concerned with assessing
behaviors and making decisions, but the decisions are of a comparative
nature. For example, how good a driver is Bill compared with Sam? Is
Sam an adequate swimmer for his age and size? Or how good is Jerry's
reading skill development compared to other students at his grade level?
Another way to consider the basic difference between the two types of
tests is to consider the anchor point for each test. A norm referenced
test is usually anchored in the middle of the ability of the group to be
tested; the test performances will then tend to spread out so comparisons
can be made. A criterion referenced test, on the other hand, is anchored
at one end. The test developer is not interested in the spread of
performances but rather in how many students are able to perform well enough
to pass the anchor point.
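The difference in anchor points can be made concrete with a small Python sketch. The cutoff of 24 and the ten norm-group scores below are invented for illustration, and the percentile-rank formula (counting tied scores as half) is one common convention assumed here, not a procedure the paper specifies.

```python
from bisect import bisect_left, bisect_right


def criterion_decision(raw_score: int, cutoff: int) -> bool:
    """Criterion-referenced reading: did the student reach the anchor point?"""
    return raw_score >= cutoff


def percentile_rank(raw_score: int, norm_scores: list[int]) -> float:
    """Norm-referenced reading: percent of the norm group scoring below,
    counting tied scores as half (an assumed, commonly used convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)
    ties = bisect_right(ordered, raw_score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm group and cutoff, invented for illustration only.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]
cutoff = 24

for score in (22, 27):
    print(score, criterion_decision(score, cutoff), percentile_rank(score, norm_group))
# 22 False 45.0
# 27 True 65.0
```

The criterion function answers only whether the anchor point was reached; the percentile answers only where a student stands within this particular group. Neither number by itself says which reading tasks the student can actually perform.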

As I said earlier, criterion referenced tests are needed to make
the decisions that are being posed by the Right to Read program and the
research effort of the U.S. Office of Education. I also indicated
that the results of standardized reading tests are now being used as the
data base for making these decisions.

How adequately do standardized reading tests meet these demands?

An examination of almost all existing standardized reading tests and 
of the research concerned with these instruments leads to the conclusion 
that these tests are neither able to meet, nor were they designed to meet,
the demands of the decision situations for which they are often being used.

This should not be taken as a blanket rejection of standardized reading 
tests. Given a clear understanding of their major purpose, i.e., the
comparison of groups and individuals, and knowledge of the limitations of the
tests, they can be very useful tools. However, they do not satisfy the 
needs I delineated earlier. 

Standardized norm referenced tests do not result in any information 
about what a student can do. For example, we have no basis for deciding 
what a raw score of 121 points, a grade score of 4.2, or a percentile of
63 means in terms of the actual reading tasks a student with such scores
would be able to perform. We can only use these scores for comparing the
student to some norm group. 

Furthermore, it is quite clear that the development of the subtests
on almost all standardized reading tests is based upon vague assumptions
about the reading act and that there is no clearly defined or empirically
supported evidence to validate the existing subtests of most standardized
reading tests. In fact, there is considerable evidence that these existing
subtests are not valid measures of actual reading subskills. The
subskills problem is compounded even more by the very obvious lack of
agreement as to what the actual subskills of reading are and how each
should be measured. In fact, one could list several hundred different
reading subskills by examining the titles of the subtests of standardized
reading tests. In addition, an examination of the numerous approaches to
measuring any one of these hundreds of skills further increases one's
doubt as to what can be measured. While this talk is not intended to dwell
on the uses of standardized reading tests, let me at least suggest that
if you use such tests you can feel quite confident about using the total
test scores for comparative purposes, but I strongly suggest that you do
not use the subtest scores for either diagnostic or comparative purposes.

The preceding statements suggest that most present standardized 
reading tests cannot meet the needs of criterion decisions. The tests 
cannot tell us what a student can be expected to do, what skill development
he needs, or whether he is meeting the objective of basic literacy. Let me
again emphasize that most standardized reading tests were not developed to
meet these needs, and the manuals of most standardized reading tests suggest
that the tests should not be used for these purposes. That caution, however,
does not seem to deter test consumers from misusing and misinterpreting
the tests.

If present standardized reading tests do not meet our criterion decision
needs, what approaches exist for developing assessment procedures that
successfully meet these needs? From my thinking on this topic I have
concluded that there are four basic approaches which may be considered.

These are: (1) the development of assessments based on criterion objectives
that would reflect the reading demands of an effective citizen; (2) the
use of average levels of achievement of some age group as standards of
achievement; (3) the development of norm referenced tests with more
specific behavioral objectives built into them; and (4) the development
of guidelines that could be used to develop situation-specific criterion
measures.

Let me expand a bit on each of these approaches. The development of
assessments based on criterion objectives is exemplified by such tests as
the New York State Minimum Competency Test in Reading. While I am not
citing this test as an outstanding example, it illustrates some of the
problems of this approach. The manual of the test states that a score of
26 correct responses out of a total of 40 multiple-choice questions
based on a series of reading selections is the standard of minimum reading
competence for a New York State high school graduate.

Another example of this approach is a reading test which I have been
thinking about since the federal government delivered my personal income
tax forms last December. It occurred to me that the personal federal income
tax form is perhaps the most widely read reading material that I could
conceptualize. Completion of the forms obviously necessitates a very
functional kind of reading, but there is also a general introductory section
at the beginning of the forms which seems to necessitate a more general
kind of reading. This is the section that discusses why everyone should
be a good citizen and pay his taxes; it also indicates the uses to which
the government is going to put the tax money and where all the money is
going to come from. In addition to the functional reading and general
reading power needed to complete the form, there are graphs and charts
to interpret. All in all, I think the federal income tax form might make
an excellent criterion referenced test.

The problems of this first approach are primarily concerned with the 
arbitrariness of deciding on the criterion tasks. The development of the 
tasks for the test would always be arbitrary to a large degree, as they 
were in the two examples I just cited; and many groups would object to 
the definitions of functional literacy implied by the tasks on the test.

In addition, the content of the test could be faulted on the grounds that
it is not representative of the basic reading ability needed by adults in our
society. There would also be concerns raised about the possibility that
such a test might limit the reading development of children; this would
be a legitimate concern if schools or teachers were satisfied with achieving
the basic levels of reading competency represented by the test and
did not try to develop each child to his fullest potential. 

The advantages of this approach would be that the test could be used 
for making decisions about the content of reading programs. It could also
be used as the focus for a study of reading subskills by encouraging
research on discovering the skills necessary to perform adequately on the
criterion test. Finally, such tests could be used as the benchmark for
determining the number of functional illiterates in the United States. 

The second approach would involve the adoption of some average level 
of reading achievement for an arbitrarily-chosen age-level as the criterion 
for basic literacy. For example, we could decide that the definition of
functional literacy is the reading score the average 15-year-old achieves
on a test of general reading achievement, such as the total score on a
standardized reading test. This approach has the same disadvantages as
traditional standardized reading tests. There are no clear-cut objectives
built into the test and there is no reference for interpreting the reading
performance. The advantage of this approach is that it would be quite easy
to develop and would result in an immediate criterion for making decisions
about literacy levels.
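The ease of this second approach is visible in how little computation it takes. The Python sketch below assumes the reference group's arithmetic mean is used as the cutoff; the five scores for 15-year-olds and the function names are hypothetical.

```python
from statistics import mean


def average_level_cutoff(reference_scores: list[float]) -> float:
    """Adopt the reference age group's average total score as the criterion."""
    return mean(reference_scores)


# Hypothetical total scores for a sample of 15-year-olds (invented).
fifteen_year_olds = [40, 52, 47, 55, 46]
cutoff = average_level_cutoff(fifteen_year_olds)  # 48


def is_functionally_literate(score: float) -> bool:
    """Hypothetical classification against the average-based cutoff."""
    return score >= cutoff


print(cutoff, is_functionally_literate(50), is_functionally_literate(45))
```

The cutoff is produced immediately, but the value 48 carries no statement of which reading tasks a student at that level can perform, and it shifts whenever the reference group changes. This reflects the disadvantage described above.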

The third approach would be the development of clearly stated behavioral
objectives within a norm referenced reading measure. This approach would
partially combine the first two approaches I have just described. It would
result in a testing instrument which would include criterion references as
well as norm references. For example, an eighth grade student might get
a raw score of 121 points on the test. This might be interpreted to mean
that he has the necessary functional reading skills to complete his personal
income tax forms and also that he is reading as well as the average ninth
grader in his second semester, or a grade score of 9.7.
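A test built this way would report two readings of the same raw score. The Python sketch below illustrates that dual interpretation. Apart from the 121 → 9.7 pair taken from the example above, the conversion table rows and the criterion cutoff of 118 are invented. The table-lookup rule (use the highest tabled raw score not exceeding the student's score) is an assumption of this sketch.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw-score -> grade-equivalent conversion table (mostly invented).
NORM_TABLE = [(110, 8.9), (115, 9.3), (121, 9.7), (126, 10.1)]
# Hypothetical minimum raw score judged sufficient for the functional task
# (e.g. completing a personal income tax form).
CRITERION_CUTOFF = 118


@dataclass
class Report:
    meets_criterion: bool                # criterion reference
    grade_equivalent: Optional[float]    # norm reference (None if below table)


def interpret(raw_score: int) -> Report:
    raws = [raw for raw, _ in NORM_TABLE]
    idx = bisect_right(raws, raw_score) - 1  # highest tabled raw score <= raw_score
    grade = NORM_TABLE[idx][1] if idx >= 0 else None
    return Report(raw_score >= CRITERION_CUTOFF, grade)


print(interpret(121))  # Report(meets_criterion=True, grade_equivalent=9.7)
```

The hard part, as the next paragraph argues, is not the lookup but justifying the criterion cutoff with real knowledge of reading demands.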

The problems of developing such a test would be monumental. I do think
we have the measurement knowledge and ingenuity to produce such a test, but
I do not believe we know enough about reading behaviors to be very
successful. Much research is needed before we can go further. I think the first
thing we need to know is what the reading demands of particular economic
groups are. A doctoral student here at Indiana is now developing a study
in which he intends to examine the reading demands of specified occupations
and whether the employees in those occupations have the necessary reading
skills for the reading demands of their jobs. He also intends to study
the non-occupational reading habits of these people. I think that a series
of such studies would provide much-needed information for deciding on the
criterion objectives for the makeup of literacy tests.



Another avenue of research leading to the development of this third
approach is more extensive observation of the reading behaviors of
subjects engaged in reading. These observations need to be conducted for a
variety of reading tests under a variety of conditions. Some of the work of
psycholinguists, such as Ken Goodman of Wayne State University, offers
initial leads along this line.

If we are to develop criterion tests, I would like to see them
developed so that they meet the definition of this third approach and that
they be based on more systematic study of reading behaviors. However, my
fourth approach suggests that it may not be feasible or logical to attempt
to develop national criterion tests in reading. It may be that each
situation requires its own criterion test. Varying socioeconomic,
geographical, or community objectives may block the development
of national criteria for basic literacy. If this is the case, we need
not be immobilized. We could develop guidelines and training programs
for the development of situation-specific criterion tests. These
guidelines would cover such topics as defining behaviors, identifying goals,
developing behavioral objectives, sampling behaviors, and test analysis
concepts.

Each of the four procedures which I have briefly explored offers a
possibility for meeting the reading assessment needs of the nation. There
are limitations and problems inherent in each approach; the approaches
that seem to offer the best alternative involve the most extensive effort,
but if we are sincerely dedicated to the Right to Read program, we need
to face the assessment problem at the outset. Facing this problem will
not only tell us where we now are, but will also force us to consider
where we want to go.




Summary

I would like to summarize briefly my three major points. First, I
think there exists a vital need for valid criterion referenced measures
in reading. In order to make valid decisions we need valid assessment
data. Second, there are almost no testing instruments available today
which can fulfill these needs. Standardized reading tests are presently
being misused for these needs, and this is leading to some rather
unfortunate conclusions and decisions in the field of reading. Third, I think
there are several approaches that can lead to the development of the kind
of tests we need.

A consideration and study of these approaches and any others that
might be added to my list of four should be given immediate attention and
top priority at the beginning of the Right to Read effort. If this is not
done, there will be little means of directing efforts toward what we want
to accomplish, and no way of knowing whether we achieve our goals.




